The incidents_house_price dataset from the previous data cleaning exercise is used in this document.
The number of categories was reduced from 39 to 15 by combining related crime types into the same category. This coarser grouping makes the key patterns easier to see when the data are plotted.
The number of crimes in each category is shown on a map of San Francisco (created with the leaflet library). This gives a spatial view of the areas with a high concentration of crimes, by crime type. The tooltip shows the category followed by the number of crimes in parentheses.
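As a rough illustration of this kind of regrouping, the Python sketch below maps raw category labels onto coarser groups with a lookup table. The document does not list its actual 39-to-15 mapping, so `CATEGORY_MAP`, the helper `regroup`, and the choice to send unmapped labels to an `Other Offenses` group are all hypothetical.

```python
# Hypothetical, partial mapping from raw category labels to coarser groups.
# The document's actual 39 -> 15 mapping is not given; entries are illustrative.
CATEGORY_MAP = {
    "LARCENY/THEFT": "Theft",
    "VEHICLE THEFT": "Theft",
    "ASSAULT": "Assault",
    "SEX OFFENSES, FORCIBLE": "Sexual Offenses",
    "SEX OFFENSES, NON FORCIBLE": "Sexual Offenses",
    "WEAPON LAWS": "Weapon",
}


def regroup(raw_categories, mapping, default="Other Offenses"):
    """Map each raw label to its coarse group; unmapped labels go to `default` (an assumption)."""
    return [mapping.get(label, default) for label in raw_categories]


raw = ["LARCENY/THEFT", "VEHICLE THEFT", "ASSAULT", "WEAPON LAWS", "GAMBLING"]
grouped = regroup(raw, CATEGORY_MAP)
print(grouped)  # ['Theft', 'Theft', 'Assault', 'Weapon', 'Other Offenses']
print(len(set(raw)), "->", len(set(grouped)))  # 5 -> 4
```

A plain dictionary keeps the mapping explicit and easy to review, which matters when several labels are merged into one group.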
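The "category (count)" tooltip text can be built from per-category counts. This is a minimal Python sketch, not the document's own code. It assumes the incidents are available as a plain list of (already regrouped) category names. `tooltip_labels` is a hypothetical helper, and the map rendering itself is not reproduced.

```python
from collections import Counter


def tooltip_labels(categories):
    """Return 'Category (count)' strings, most frequent category first."""
    counts = Counter(categories)
    return [f"{category} ({n})" for category, n in counts.most_common()]


# Toy input: one entry per incident (invented values).
incidents = ["Theft", "Assault", "Theft", "Burglary", "Theft", "Assault"]
print(tooltip_labels(incidents))  # ['Theft (3)', 'Assault (2)', 'Burglary (1)']
```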
One observes a higher density of crimes in the north-east part of the map. This region belongs to the SOUTHERN, MISSION, CENTRAL and NORTHERN police districts. One possible explanation for the high number of crimes is the high population density. The large number of tourists in this area may also create more opportunities for crime. The south-east region has a much lower crime density. Note, however, that there are four hot-spots in the south.
In addition to the plot above, it is also instructive to visualize crimes by category. As an illustration, the plots below show the distribution of crimes for the top two (Theft and Arson) and bottom two (Sexual Offenses and Weapon) categories. The high larceny/theft rate in the city area is probably due to the heavy use of public transportation, which makes passengers convenient targets for thieves. As mentioned previously, the large number of tourists visiting the area are also easy victims. The distributions of the other crime types could be explained in a similar way (as a pedagogical exercise), but this is not pursued further in this document.
The bar plot below provides a relative comparison of the number of crimes in each category, with counts summed over all years. We observe that from 2003 to 2018, Theft, Arson, Assault and Burglary were (and likely still are) the major concerns for the San Francisco Police Department. The Other Offenses and Non-criminal categories, which group together many less frequent, non-violent crime types, also contribute significantly.
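Choosing the top two and bottom two categories amounts to ranking categories by incident count. A minimal sketch with made-up counts, where `top_and_bottom` is a hypothetical helper:

```python
from collections import Counter


def top_and_bottom(categories, k=2):
    """Return the k most frequent and k least frequent categories.

    Ties keep the order in which categories were first seen
    (Counter.most_common behaviour), so ties are broken arbitrarily.
    """
    ranked = [category for category, _ in Counter(categories).most_common()]
    # Slice from max(len - k, 0) so that k=0 yields [] rather than the whole list.
    return ranked[:k], ranked[max(len(ranked) - k, 0):]


# Invented counts that mimic the ordering described in the text.
incidents = (["Theft"] * 5 + ["Arson"] * 4 + ["Assault"] * 3
             + ["Weapon"] * 2 + ["Sexual Offenses"] * 1)
top, bottom = top_and_bottom(incidents)
print(top)     # ['Theft', 'Arson']
print(bottom)  # ['Weapon', 'Sexual Offenses']
```

The bottom list is ordered from more to less frequent. If there are fewer than `2 * k` categories, the two lists overlap.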
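Summing over all years just means ignoring the year when counting. The sketch below assumes each incident is a hypothetical `(year, category)` pair and reports each category's percentage of the total. This is one way to express the relative comparison the bar plot makes. The records are invented.

```python
from collections import Counter


def category_shares(records):
    """Percentage of all incidents in each category, pooling every year.

    `records` is an iterable of (year, category) pairs; the year is ignored,
    which is what summing over all years amounts to.
    """
    counts = Counter(category for _year, category in records)
    total = sum(counts.values())
    return {category: round(100 * n / total, 1) for category, n in counts.most_common()}


# Invented records, not taken from the dataset.
records = [(2003, "Theft"), (2010, "Theft"), (2018, "Assault"),
           (2015, "Theft"), (2004, "Burglary")]
print(category_shares(records))  # {'Theft': 60.0, 'Assault': 20.0, 'Burglary': 20.0}
```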
The heatmap below shows the number of crimes by month and year. It makes it easy to identify hot-spots, i.e., month-year combinations with high crime counts. One observes an increase in crimes from 2013 to 2017 compared with the previous years. Crimes are lower in February, November and December. February's lower total is at least partly because it has fewer days than the other months. For the winter months of November and December, a lower population density during the holiday/vacation period could explain the lower crime rates, and tourism is also low around this time. In contrast, the months from March to October see a relatively larger number of crimes, probably due to the correspondingly higher population density and hence greater opportunity.
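A month-by-year heatmap is a grid of counts keyed by (year, month). Raw monthly totals also depend on how many days a month has. The sketch below assumes incidents are given as `datetime.date` values (invented toy data). It builds the count grid and also divides each cell by the number of days in that month, using `calendar.monthrange`. The helper names are hypothetical.

```python
import calendar
from collections import Counter
from datetime import date


def month_year_counts(dates):
    """Count incidents in each (year, month) cell of the heatmap."""
    return Counter((d.year, d.month) for d in dates)


def to_grid(counts, years, months=range(1, 13)):
    """Arrange counts as rows = months, columns = years; empty cells are 0."""
    return [[counts.get((y, m), 0) for y in years] for m in months]


def per_day(counts):
    """Divide each cell by the number of days in that month (28 to 31)."""
    return {(y, m): n / calendar.monthrange(y, m)[1] for (y, m), n in counts.items()}


# Invented data: two incidents on every day of February and March 2015.
dates = ([date(2015, 2, d) for d in range(1, 29) for _ in range(2)]
         + [date(2015, 3, d) for d in range(1, 32) for _ in range(2)])
counts = month_year_counts(dates)
print(to_grid(counts, years=[2015], months=[2, 3]))  # [[56], [62]]
print(per_day(counts))  # {(2015, 2): 2.0, (2015, 3): 2.0}
```

In this toy example the raw totals differ (56 vs. 62) while the daily rate is identical. Normalizing per day therefore separates a genuine seasonal effect from month length.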
The plot below shows the number of incidents over a 24-hour period for the six most frequent categories. The Time axis corresponds to the 24-hour clock time. Each data point is the total number of crimes from 2003 to 2018 for the corresponding category and time. The trends can be divided into three distinct time slots:
For each data point in the plot above, the plot below provides the breakdown by year. One observes the following:
The bar plot below shows the number of resolved and unresolved incidents for each police district (PdDistrict). As seen in the leaflet map above, the SOUTHERN, MISSION, CENTRAL and NORTHERN districts have the largest number of crimes, yet most of the cases there remain unresolved. TENDERLOIN is the only police district where the number of resolved cases exceeds the number of unresolved cases.
The plot below shows the median house price and the number of crimes for each PdDistrict, to see whether crime affects real estate prices. As can be seen, there is no apparent correlation between house prices and the corresponding number of crimes. This does not rule out a causal relationship, but with only a single aggregate value per district, the data at hand is insufficient to test such a hypothesis.
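The resolved/unresolved comparison is a per-district tally. This sketch assumes each incident carries a district name and a resolution string. It also assumes that the value `NONE` marks an unresolved case and that every other value counts as resolved. This is an assumption about the data, not something the document states. The records and helper names are hypothetical.

```python
from collections import defaultdict


def resolution_counts(records, unresolved_label="NONE"):
    """Tally resolved vs. unresolved incidents per police district.

    Assumption: an incident is unresolved when its resolution equals
    `unresolved_label`; any other value counts as resolved.
    """
    tally = defaultdict(lambda: {"resolved": 0, "unresolved": 0})
    for district, resolution in records:
        key = "unresolved" if resolution == unresolved_label else "resolved"
        tally[district][key] += 1
    return dict(tally)


def resolved_majority(tally):
    """Districts where resolved incidents outnumber unresolved ones."""
    return sorted(d for d, t in tally.items() if t["resolved"] > t["unresolved"])


# Invented (district, resolution) records.
records = [("TENDERLOIN", "ARREST, BOOKED"), ("TENDERLOIN", "ARREST, CITED"),
           ("TENDERLOIN", "NONE"), ("MISSION", "NONE"),
           ("MISSION", "NONE"), ("MISSION", "ARREST, BOOKED")]
tally = resolution_counts(records)
print(tally["MISSION"])          # {'resolved': 1, 'unresolved': 2}
print(resolved_majority(tally))  # ['TENDERLOIN']
```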
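A single coefficient can make a claim of "no correlation" more concrete. The sketch below computes the Pearson coefficient by hand in standard-library Python. Pearson measures only linear association, so a rank-based measure such as Spearman's could be swapped in. The district values are invented, and the `pearson` helper is hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / (sx * sy)


# Made-up values for five hypothetical districts (not the real data).
crimes = [10, 20, 30, 40, 50]        # incidents, thousands
prices = [900, 800, 1000, 800, 900]  # median house price, $ thousands
print(pearson(crimes, prices))  # 0.0
```

With one point per district, the sample is small, and any such coefficient is a noisy estimate.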